Multimedia API Reference

September 12, 2016 | 24.2 Release

4 Ways to Video Decode with GIE Processing Sample

Overview

This use case application showcases a typical appliance performing intelligent video analytics. Application areas include public safety, smart cities, and autonomous machines. This example demonstrates four (4) concurrent video streams going through a decoding process using the on-chip decoders, video scaling using the on-chip scaler, and GPU compute. For simplicity of demonstration, only one of the channels uses GIE to perform object identification and generate a bounding box around the identified object. This sample also uses video converter functions to perform various format conversions and uses EGLImage to demonstrate buffer sharing and image display capability.

In this sample, object detection is limited to identifying cars in video streams of 960 x 540 resolution, running up to 14 FPS. The network is based on GoogleNet. Inference is performed frame by frame, and no object tracking is involved. Note that this network is purely an example that showcases how GIE can be used to build the compute pipeline quickly. The trained GoogleNet is provided as source and was trained using NVIDIA DIGITS with roughly 3000 frames taken from an elevation of 5-10 feet. Detection accuracy is expected to vary with the video samples fed in. Given that this sample is locked to perform at half-HD resolutions under 10 FPS, video feeds with higher FPS for inference will show stuttering during playback.

NvEGLImageFromFd is an NVIDIA-defined API that returns an EGLImage pointer from the file descriptor buffer allocated through the Tegra mechanism. GIE then uses the EGLImage buffer to render the bounding box onto the image.

This sample does not require a camera.

The image below shows a sample block diagram.

The images below show data flow details for the channel using GIE.

Prerequisites

Before running the sample, you must have the following:

  • CUDA 8.0 Toolkit for L4T Rel 24.2
  • GPU Inference Engine (GIE)
  • OpenCV4Tegra
  • README that provides details on the environment requirements to build and run the sample

Key Structure and Classes

A global structure, struct context_t, manages all the resources in the application.

Element                      Description
NvVideoDecoder               Contains all video decoding-related elements and functions.
NvVideoConverter             Contains elements and functions for video format conversion.
NvEglRenderer                Contains all EGL display rendering-related functions.
egl_image                    The EGLImage used for CUDA processing.
conv_output_plane_buf_queue  Output plane queue for video conversion.
dec_capture_loop             The thread handler for decoding capture loop.

NvVideoDecoder

The NvVideoDecoder class creates a new V4L2 Video Decoder. The following table describes the key NvVideoDecoder members that this sample uses.

Member                 Description
output_plane           Holds the V4L2 output plane.
capture_plane          Holds the V4L2 capture plane.
createVideoDecoder     Static function to create a video decoder object.
subscribeEvent         Subscribes to an event.
setExtControls         Sets external controls on the V4L2 device.
setOutputPlaneFormat   Sets the output plane format.
setCapturePlaneFormat  Sets the capture plane format.
getControl             TBD
dqEvent                Dequeues the event reported by the V4L2 device.
isInError              Checks whether the decoder is in an error state.

NvVideoConverter

The NvVideoConverter class packages all video conversion-related elements and functions. It performs color space conversion, scaling, and conversion between hardware buffer memory and software buffer memory. The following table describes the key NvVideoConverter members that this sample uses.

Member                 Description
output_plane           Holds the output plane.
capture_plane          Holds the capture plane.
waitForIdle            Waits until all the buffers queued on the output plane are converted and dequeued from the capture plane. This is a blocking method.
setCapturePlaneFormat  Sets the format on the converter capture plane.
setOutputPlaneFormat   Sets the format on the converter output plane.

Both NvVideoDecoder and NvVideoConverter contain two key elements: output_plane and capture_plane. These objects are derived from the class type NvV4l2ElementPlane.

NvV4l2ElementPlane

NvV4l2ElementPlane creates an NvV4l2Element plane. The following table describes the key NvV4l2ElementPlane members used in this sample.

Member               Description
setupPlane           Sets up the plane of the V4L2 element.
deinitPlane          Destroys the plane of the V4L2 element.
setStreamStatus      Starts/Stops the stream.
setDQThreadCallback  Sets the callback function of the dequeue buffer thread.
startDQThread        Starts the dequeue buffer thread.
stopDQThread         Stops the dequeue buffer thread.
qBuffer              Queues a V4L2 buffer on the plane.
dqBuffer             Dequeues a V4L2 buffer from the plane.
getNumBuffers        Gets the number of V4L2 buffers.
getNumQueuedBuffers  Gets the number of V4L2 buffers in the queue.
getNthBuffer         Gets the NvBuffer queue object at index N.
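
The queue/dequeue cycle behind qBuffer, dqBuffer, and getNumQueuedBuffers can be illustrated with a toy model. The sketch below is only a stand-in: ToyPlane is a hypothetical class written for this illustration, it borrows member names from the table above, and it is not the real NvV4l2ElementPlane, whose signatures are not shown in this document. Strict FIFO ordering is an assumption made to keep the example small.

```cpp
#include <cstddef>
#include <deque>
#include <stdexcept>

// ToyPlane is a stand-in written for this sketch. It borrows member names
// from the table above but is NOT the real NvV4l2ElementPlane.
class ToyPlane {
public:
    explicit ToyPlane(std::size_t num_buffers) : num_buffers_(num_buffers) {}

    // Hands buffer `index` to the (imaginary) device.
    void qBuffer(std::size_t index) {
        if (index >= num_buffers_)
            throw std::out_of_range("buffer index out of range");
        queued_.push_back(index);
    }

    // Takes back the oldest queued buffer; FIFO order is a simplification.
    std::size_t dqBuffer() {
        if (queued_.empty())
            throw std::runtime_error("no buffer queued");
        const std::size_t index = queued_.front();
        queued_.pop_front();
        return index;
    }

    std::size_t getNumBuffers() const { return num_buffers_; }
    std::size_t getNumQueuedBuffers() const { return queued_.size(); }

private:
    std::size_t num_buffers_;          // total buffers owned by the plane
    std::deque<std::size_t> queued_;   // indices currently held by the device
};
```

In this model, queuing buffers 0 and 1 on a four-buffer plane leaves two buffers queued, and the first dqBuffer call returns index 0.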

GIE_Context

GIE_Context provides a series of interfaces to load a Caffe model and perform inference. The following table describes the key GIE_Context members used in this sample.

Member           Description
gie_alloc        Sets up all resources and parameters.
gie_free         Clears all resources.
caffeToGIEModel  Loads a Caffe model into a GIE model.
doInference      Interface for inference after the GIE model is loaded.
parse_bbox       Parses GoogleNet's output results.
parse_hel_bbox   Parses Nvhel net's output results.
parse_net        Parses net info from the proto file.

Two global functions are used to create and destroy an EGLImage from a dmabuf file descriptor.

Global Function    Description
NvEGLImageFromFd   Creates EGLImage from dmabuf fd.
NvDestroyEGLImage  Destroys the EGLImage.
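
Because creation and destruction come as a pair, one possible way to keep them balanced is a small scope guard. The sketch below is an illustration under stated assumptions, not the sample's own code: fake_image_from_fd, fake_destroy_image, FakeImage, and ImageGuard are hypothetical names, and the fake functions merely stand in for NvEGLImageFromFd and NvDestroyEGLImage, whose real signatures are not given in this document.

```cpp
// Stand-ins for this sketch only; they are NOT NvEGLImageFromFd or
// NvDestroyEGLImage. The counter tracks how many images are alive.
static int g_live_images = 0;
using FakeImage = int;  // the real handle type is not shown in this document

FakeImage fake_image_from_fd(int dmabuf_fd) {
    ++g_live_images;
    return dmabuf_fd + 1000;  // arbitrary fake handle value
}

void fake_destroy_image(FakeImage) { --g_live_images; }

// Owns one image and destroys it exactly once when the guard leaves scope.
class ImageGuard {
public:
    explicit ImageGuard(int dmabuf_fd)
        : image_(fake_image_from_fd(dmabuf_fd)), owns_(true) {}
    ~ImageGuard() {
        if (owns_) fake_destroy_image(image_);
    }

    // Copying would lead to a double destroy, so it is disabled.
    ImageGuard(const ImageGuard&) = delete;
    ImageGuard& operator=(const ImageGuard&) = delete;

    // Moving transfers ownership to the new guard.
    ImageGuard(ImageGuard&& other) noexcept
        : image_(other.image_), owns_(other.owns_) {
        other.owns_ = false;
    }

    FakeImage get() const { return image_; }

private:
    FakeImage image_;
    bool owns_;
};
```

With a guard like this, the destroy call runs even when the code that processes the image returns early or throws.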

Command Line Options

To run the sample, execute:

backend <channel-num> <in-file1> <in-file2>... <in-format> [options]

The following video formats are supported for use with command line options:

  • H.264
  • H.265
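
The positional layout of the command line (a channel count, one input file per channel, the input format, then options) can be sketched as a small parser. This is a hypothetical illustration and not the sample's actual argument handling: BackendArgs and parse_backend_args are invented names, the assumption that exactly channel-num input files follow the count is inferred from the usage line, and the format token is kept as an opaque string because its exact spelling is not given here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for the positional arguments; not part of the sample.
struct BackendArgs {
    int channel_num = 0;
    std::vector<std::string> in_files;  // one entry per channel (assumed)
    std::string in_format;              // kept as an opaque token
    std::vector<std::string> options;   // everything after <in-format>
};

// `args` excludes the program name: <channel-num>, the input files,
// <in-format>, then any options.
BackendArgs parse_backend_args(const std::vector<std::string>& args) {
    if (args.empty())
        throw std::invalid_argument("missing <channel-num>");

    BackendArgs out;
    out.channel_num = std::stoi(args[0]);  // throws if not a number
    if (out.channel_num <= 0)
        throw std::invalid_argument("<channel-num> must be positive");

    const std::size_t n = static_cast<std::size_t>(out.channel_num);
    if (args.size() < n + 2)
        throw std::invalid_argument(
            "expected one input file per channel plus <in-format>");

    out.in_files.assign(args.begin() + 1, args.begin() + 1 + n);
    out.in_format = args[1 + n];
    out.options.assign(args.begin() + 2 + n, args.end());
    return out;
}
```

The options are returned unparsed so that the flags listed in the Options section can be handled separately.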

Options

The table below describes the available options.

Option               Description
-h, --help           Prints the help text.
-fps <fps>           Display rate in frames per second. Default = 30.
--s                  Provides a statistic of each channel.
-run-opt <0-3>       0=default, 1=parser only, 2=parser + decoder, 3=parser + decoder + VIC
--input-nalu         Specifies that the input to the decoder is in NAL units.
--input-chunks       Specifies that the input to the decoder is in chunks of bytes.
--gie-deployfile     Sets the deploy file name.
--gie-modelfile      Sets the model file name.
--gie-proc-interval  Sets the process interval; one frame is processed every gie-proc-interval frames.
--gie-float16        0=default, 1=float16, 2=float32.
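
The effect of the gie-proc-interval option can be pictured as a simple modulo test on the frame index. The sketch below is one plausible reading, not the sample's confirmed logic: should_run_inference and count_inferred_frames are hypothetical helpers, and the choice to process frame 0 first and to treat intervals below 1 as 1 are assumptions.

```cpp
#include <cstdint>

// Assumed reading of gie-proc-interval: frame 0 is processed, then every
// interval-th frame after it. An interval below 1 is treated as 1.
bool should_run_inference(std::uint64_t frame_index, int proc_interval) {
    const std::uint64_t step =
        proc_interval < 1 ? 1u : static_cast<std::uint64_t>(proc_interval);
    return frame_index % step == 0;
}

// Counts how many of the first total_frames frames would be processed.
std::uint64_t count_inferred_frames(std::uint64_t total_frames,
                                    int proc_interval) {
    std::uint64_t count = 0;
    for (std::uint64_t i = 0; i < total_frames; ++i)
        if (should_run_inference(i, proc_interval))
            ++count;
    return count;
}
```

Under these assumptions, an interval of 3 over nine frames selects frames 0, 3, and 6, so only a third of the decoded frames reach inference.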

For X11 technical details, see:

http://www.x.org/docs/X11/xlib.pdf